REPORT RESUMES



ED 013 57^! 

MLA FOREIGN LANGUAGE PROFICIENCY TESTS FOR TEACHERS AND
ADVANCED STUDENTS.

BY- STARR, WILMARTH H.

MODERN LANGUAGE ASSN. OF AMERICA, NEW YORK, N.Y.

FL 000 379



EDRS PRICE MF-$0.25 HC-$0.60



15P.



PUB DATE SEP 62 



DESCRIPTORS- *ADVANCED STUDENTS, *LANGUAGE PROFICIENCY,
*LANGUAGE TESTS, *LANGUAGE TEACHERS, *NATIONAL COMPETENCY
TESTS, FRENCH, GERMAN, ITALIAN, LANGUAGE SKILLS, RUSSIAN,
SPANISH, STANDARDIZED TESTS, TEST RELIABILITY, TEST VALIDITY,
TESTING PROGRAMS, TEACHER QUALIFICATIONS, MLA PROFICIENCY
TESTS, EDUCATIONAL TESTING SERVICE,



THE DEVELOPMENT AND EVALUATION OF THE MODERN LANGUAGE
ASSOCIATION (MLA) FOREIGN LANGUAGE PROFICIENCY TESTS FOR
TEACHERS AND ADVANCED STUDENTS ARE THE SUBJECTS OF THIS FINAL
PROJECT REPORT. FOLLOWING AN ACCOUNT OF THE EVENTS THAT LED
TO THE AWARDING OF A GOVERNMENT CONTRACT TO MLA TO DEVELOP
NATIONALLY STANDARDIZED QUALIFICATION TESTS AND A DESCRIPTION
OF THE LOGISTIC AND EVALUATION PROBLEMS INVOLVED IN
DEVELOPING THESE TWO 31-TEST BATTERIES IN FRENCH, GERMAN,
ITALIAN, RUSSIAN, AND SPANISH, THE REPORT APPRAISES, WITH
SUPPORTING STATISTICAL DATA, THE HIGH LEVEL OF RELIABILITY
AND STATISTICAL VALIDITY OF THESE TESTS IN THE COMPETENCY
AREAS OF LISTENING COMPREHENSION, SPEAKING, READING, WRITING,
APPLIED LINGUISTICS, CULTURE AND CIVILIZATION, AND
PROFESSIONAL PREPARATION. A COMPARISON OF PRE-TEST AND
POST-TEST NORMS FURNISHES THE GOVERNMENT AND THE PROFESSION
WITH OBJECTIVE DATA FOR FUTURE TEST REVISIONS AND FOR SUPPORT
AND ANALYSIS OF INSTITUTE PROGRAMS. FOUR APPENDIXES INCLUDE
INFORMATION ABOUT MODERN FOREIGN LANGUAGE TEACHER
QUALIFICATIONS, A DIRECTORY OF THE MEMBERS OF THE TEST
CONSTRUCTION COMMITTEES, AND DATA JUSTIFYING TEST RELIABILITY
AND THE INTERCORRELATIONS BETWEEN SKILLS. THIS ARTICLE IS A
REPRINT FROM "PMLA," VOLUME 77, NUMBER 4, PART 2, SEPTEMBER
1962. (AB)







PMLA 

PUBLICATIONS OF THE MODERN LANGUAGE ASSOCIATION OF AMERICA



Volume LXXVII 




Issued Five Times a Year



September 1962 



Number 4, Part 2 




MLA FOREIGN LANGUAGE PROFICIENCY TESTS FOR 
TEACHERS AND ADVANCED STUDENTS 







WILMARTH H. STARR 



U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE
PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION
POSITION OR POLICY.

Published by the MODERN LANGUAGE ASSOCIATION OF AMERICA 



Materials Center 
4 Washington Place 
New York, N.Y. 10003 




"PERMISSION TO REPRODUCE THIS 
COPYRIGHTED MATERIAL HAS BEEN GRANTED
BY_
TO ERIC AND ORGANIZATIONS OPERATING
UNDER AGREEMENTS WITH THE U.S. OFFICE OF
EDUCATION. FURTHER REPRODUCTION OUTSIDE
THE ERIC SYSTEM REQUIRES PERMISSION OF
THE COPYRIGHT OWNER."






MLA FOREIGN LANGUAGE PROFICIENCY TESTS FOR 
TEACHERS AND ADVANCED STUDENTS* 

By Wilmarth H. Starr 

Director, Modern Language Association Testing Project 



I. Brief History of the Project: Since 1952,
the Foreign Language Program of the Modern 
Language Association of America, responding to 
the national urgency with regard to foreign 
languages, has been engaged in a vigorous cam- 
paign aimed in large part at improving foreign- 
language teaching in our country. 

In 1955, as one of its activities, the Steering 
Committee of the Foreign Language Program 
formulated the “Qualifications for Secondary 
School Teachers of Modern Foreign Languages,” 
a statement which was subsequently endorsed 
for publication by the MLA Executive Council, 
by the Modern Language Committee of the 
Secondary Education Board, by the Committee 
on the Language Program of the American 
Council of Learned Societies, and by the execu- 
tive boards or councils of the following national 
and regional organizations: National Federation 
of Modern Language Teachers Associations, 
American Association of Teachers of French, 
American Association of Teachers of German, 
American Association of Teachers of Italian, 
American Association of Teachers of Spanish and 
Portuguese, American Association of Teachers of 
Slavic and East European Languages, Central 
States Modern Language Teachers Association, 
Middle States Association of Modern Language 
Teachers, New England Modern Language 
Association, Northeast Conference on the Teach- 
ing of Foreign Languages, Northwest Conference 
on Foreign Language Teaching, Philological 
Association of the Pacific Coast, Rocky Moun- 
tain Modern Language Association, South Atlan- 
tic Modern Language Association, and South- 
Central Modern Language Association. 

The statement established three general levels 
of proficiency (Minimal, Good, and Superior) for 
seven areas of language teaching competencies: 
1) aural understanding, 2) speaking, 3) reading, 
4) writing, 5) language analysis, 6) culture, 
7) professional preparation. In subsequent con- 
ferences involving national leaders representing 
the field of educational administration and na- 
tional leaders in the foreign language profession 
the need for the development of standardized 
proficiency tests as an aspect of teacher preparation
and certification was discussed and endorsed.
It was obvious to many that the statement
of desiderata, no matter how strongly
representative of a consensus of the profession,
would not be as effective as the situation demanded
until nationally standardized tests could
be developed that would implement the descriptions
of competencies.

In the spring of 1959 the means to develop 
nationally standardized qualification tests for 
teachers of foreign languages were implemented 
under a contract between the U. S. Office of 
Education (NDEA Title VI) and the Modern 
Language Association. 

Professor Wilmarth H. Starr, then Head of the 
Department of Foreign Languages and Classics 
at the University of Maine and currently Head 
of the All-University Department of Romance 
and Slavic Languages and Literatures at New 
York University, was named Project Director. 
During the summer and early fall of 1959 the 
plan of organization for the project was de- 
veloped, a preliminary study of typical existing 
foreign language tests was carried out, and the 
first meeting of the Area Committees to refine 
test objectives and to develop test specifications 
was held. 

It is important to note that the Qualifications 
Statement referred to above, as modified for use 
by the Committees (see Appendix A), became 
the guide for the range and spectrum of the test 
batteries and has remained so throughout the 
period of test development. The test batteries 
thus serve the interests of the profession as 
originally defined by the profession. 

In this respect, it was a basic assumption of the 
project proposal that the tests should essentially 
be developed by the people who use them. The 
tests are then a product of the foreign-language
teaching profession as illustrated by the nation- 
ally recognized names of the teachers and institu- 
tions which have been involved in the project 
since its inception (see Appendix B). In support 
of this professional attitude, which from the 
beginning called upon the best efforts of a 
nationally representative cross section of foreign- 
language scholars and teachers within the Modern
Language Association at university, college,
and secondary-school levels, the MLA has been
reinforced in its efforts, also from the beginning,
by the constant and close collaboration of Educational
Testing Service, whose representatives in
each language and area and whose testing experts
have added their resources to those of the
professional language teacher.

* The final report on Contract No. SAE 8349 submitted
to the Acting Director of the Division of College and University
Assistance, Bureau of Educational Assistance Programs,
United States Office of Education, on 15 June 1962.

Evidence of the objectivity with which we 
have approached our task is the way in which the 
name of the test batteries has been changed 
during the course of the test development. From
the original concept of Qualifications Tests for
Secondary School Teachers of Foreign Languages,
there has been a series of modifications to
Qualifications Tests for Teachers of Modern
Foreign Languages to the present title, Foreign
Language Proficiency Tests for Teachers and 
Advanced Students. In the first place it became 
apparent that the word “qualifications” sug- 
gested impingement upon the rights of States 
and individual institutions to determine their 
own qualification criteria, since it seemed to 
imply a tacit assumption that the MLA or the 
government was undertaking to “impose” qual- 
ifications standards. The proper view implicit in 
the new title is that the test batteries are sensi- 
tive instruments to measure proficiency as related 
to national norms, but the presumption of 
“qualifications” is clearly left to the agencies 
established for such purposes. 

In the second place, it has become equally 
clear that important uses can be made of the 
tests in terms of proficiency measurement for 
advanced students in teacher-training programs, 
in M.A. and Ph.D. programs, as guides for place- 
ment and diagnostic purposes or as indicators of 
achievement. 

Needless to say, the tests themselves have also 
undergone an elaborate process of evolution. The 
62 preliminary tests developed in more than 75 
meetings of test construction committees in
1959-60 were considerably longer than necessary
for even the high reliability required of them. 
This was because we did not wish to pre-judge 
sensitive problems in connection with speaking 
and writing and because we wished to experiment 
with a maximum of item types from which we 
could then select the best on the basis of data 
analysis. In the summer and fall of 1960 over 
30,000 individual preliminary tests were admin- 
istered and over 26,000 were scored. As a result 
we were provided with an enormous amount of 
data for revision purposes and at the same time
were able to give significant information about 
the test population in 37 NDEA Summer In- 
stitutes, 5 Academic-Year Institutes, and selected 
control groups in the Carnegie Inter-University 
programs and the Middlebury Italian Summer 
School. 

Apart from the considerable logistics problem 
involved in printing, coding, and shipping more 
than three tons of test materials to various cen- 
ters of administration in this country and abroad, 
major problems were associated with the fact 
that scoring teams had to be trained not only to 
use the electronic equipment effectively, but to 
control the sensitive evaluation processes estab- 
lished for this pioneering venture. It must be 
pointed out that there was no previous experi- 
ence to draw from which was predictably ap- 
plicable to a five-language, oral-production oper- 
ation on the scale of our project. Similar prob- 
lems were experienced in connection with the 
scoring of that part of the writing tests not sub- 
ject to machine scoring. 

In connection with the scoring of those tests
that required human scorers, it is worth noting, as a
testimonial of professional loyalty, that signifi- 
cant numbers of trained scorers from the 1960 
and 1961 scoring sessions will return as the 
nucleus for the 1962 scoring teams. 

In the fall and winter of 1960-61, the data
compiled by Educational Testing Service was
explained to and analyzed by the committee
chairmen in a series of seven meetings preliminary
to final form revisions. As a result, it was
possible to develop final batteries of predictably
high reliability and discriminatory power with
substantially sharpened, refined, and shortened
tests. The final forms were ready for administration
in the summer of 1961 to the 75 Summer and
Academic-Year NDEA Institutes, an additional
control group studying in Mexico (University of
Arizona) and two groups going to Russia (Indiana
University and the University of Michigan).

                            Actual Testing Time in Minutes

Test                        Preliminary   Revised   Saving
                               Forms       Forms
Listening Comprehension          40          20       20
Speaking                         40          15       25
Reading                          70          40       30
Writing                          90          45       45
Applied Linguistics              40          40        0
Culture-Civilization             60          30       30
Professional Preparation         70          45       25

Total                           410         235      175
                             = 6 hrs.    = 3 hrs.  = 2 hrs.
                               50 min.     55 min.   55 min.

The preceding table illustrates actual testing 
time saved by the final forms over the prelimin- 
ary forms. The length of the revised forms was 
determined by the number of items necessary to 
insure figures for reliability and discrimination 
consistent with good testing practices and by the 
minimum number of parts necessary to produce 
sophisticated coverage of the testing problem. 
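
The arithmetic behind the testing-time table can be checked with a short sketch. The figures are transcribed from the table; the names `PRELIMINARY`, `REVISED`, `savings`, and `hours_minutes` are illustrative and do not come from the report.

```python
# Minutes of actual testing time per test, transcribed from the table
# comparing preliminary and revised forms.
PRELIMINARY = {
    "Listening Comprehension": 40,
    "Speaking": 40,
    "Reading": 70,
    "Writing": 90,
    "Applied Linguistics": 40,
    "Culture-Civilization": 60,
    "Professional Preparation": 70,
}
REVISED = {
    "Listening Comprehension": 20,
    "Speaking": 15,
    "Reading": 40,
    "Writing": 45,
    "Applied Linguistics": 40,
    "Culture-Civilization": 30,
    "Professional Preparation": 45,
}


def savings(before, after):
    """Return the minutes saved per test (before minus after)."""
    return {test: before[test] - after[test] for test in before}


def hours_minutes(total_minutes):
    """Format a number of minutes in the table's 'H hrs. M min.' style."""
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} hrs. {minutes} min."


if __name__ == "__main__":
    saved = savings(PRELIMINARY, REVISED)
    for test, minutes in saved.items():
        print(f"{test:26s} {minutes:3d}")
    print("Preliminary:", hours_minutes(sum(PRELIMINARY.values())))
    print("Revised:    ", hours_minutes(sum(REVISED.values())))
    print("Saved:      ", hours_minutes(sum(saved.values())))
```

The column sums reproduce the printed totals of 410, 235, and 175 minutes, which also confirms that the Applied Linguistics test was the only one left at its original length.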

In the summer and early fall of 1961 over 
43,000 individual tests, including over 7,000 
speaking tests, were administered, scored, and 
reported. The testing population included the 68 
Summer and 7 Academic-Year NDEA Institutes. 
The test results were analyzed and the findings 
published by Educational Testing Service. 

II. The Tests: The contractual obligation to 
develop two thirty-one test batteries in five 
languages (French, German, Italian, Russian, 
Spanish) covering the seven competencies 
(Listening Comprehension, Speaking, Reading, 
Writing, Applied Linguistics, Culture-Civiliza- 
tion, Professional Preparation) has been com- 
pleted according to the time schedule delineated 
in the contract. What is more important is the 
quality of the tests and their value to the foreign- 
language profession and the educational com- 
munity. At this point in time it is perhaps best 
to limit ourselves to what is statistically verifi- 
able in answering the questions implicit in the 
preceding sentence. At the original conferences 
of the MLA chairmen and the ETS advisors to 
set the specifications for the tests it was agreed
to aim at individual test reliability of .80 and
battery reliability of .90 in accordance with 
accepted testing procedures. It may be clearly 
stated that in every case the minimum require- 
ments for validity and reliability have been 
appreciably exceeded. Responsible officers of 
Educational Testing Service have stated that 
these MLA test batteries are among the most 
valid and reliable test batteries with which the 
ETS has ever been associated. 

Where human scorers are concerned, as in the 
Speaking and Writing tests, one of the critical
problems is scorer reliability, since no tests can
ever be more reliable in effective use than the 
scoring processes applied to them. In this con- 
text, we can report that, on the basis of scorer 
reliability checks for the Writing tests, we have 
achieved a figure of .996, which is about as good 
as can be obtained with machine scoring. A 
special problem has arisen in this regard in con- 
nection with the Speaking tests. In the first 
place, ETS has not computed the statistical re- 
liability of the individual Speaking tests because 
every item on a given tape has been rated by a 
single rater and there is certain to be a “halo” 
effect because the scorer is agreeing with himself. 
All indications, however, are that we are obtain- 
ing high scorer reliability and plans are now in 
process to test the theory by having three differ- 
ent people score three different parts without 
hearing the other parts and with pre- and post- 
tests mixed in without the scorers’ knowledge. A 
sample study (see Appendix C) for French shows 
a high reliability of .93 between the A and B 
forms and a correlation of .82 between pre- and 
post-tests. In addition, statistical analysis based 
on 1,336 cases shows a multiple correlation figure 
of .83 for the prediction of performance on the 
Speaking test from the scores of the Listening, 
Reading, and Writing tests. All indications, there- 
fore, are that we are dealing with a reliable test 
in this new area of oral production measurement.
A second problem that occurred particularly 
with the 1960 preliminary forms was the fact that 
a significant number of participants received 
lower scores on post-tests than on pre-tests. It is 
not possible to suppose retrogression on the part 
of the number of cases involved and our conclu- 
sion, after multiple checking of all statistics, was 
that the scorers became gradually more rigorous 
as scoring proceeded. This effect was practically 
eliminated in 1961 and we assume that we have 
the matter under control. The following table 
gives the reliability figures available at present 
for the individual tests. 





                                Form A    Form B
French
  Listening                      .9J7      .912
  Reading                        .970      .930
  Writing                        .935      .942
  Linguistics                    .872      .863
  Culture-Civilization           .858      .868
Spanish
  Listening                      .911      .908
  Reading                        .913      .899
  Writing                        .944      .948
  Linguistics                    .837      .849
  Culture-Civilization           .889      .886
Italian
  Listening                      .845      .862
  Reading                        .935      .867
  Writing                        .965      .950
  Linguistics                    .845      .781
  Culture-Civilization           .889      .829
German
  Listening                      .921      .890
  Reading                        .942      .928
  Writing                        .968      .964
  Linguistics                    .898      .887
  Culture-Civilization           .882      .866
Russian
  Listening                      .815      .910
  Reading                        .928      .919
  Writing                        .962      .962
  Linguistics                    .818      .855
  Culture-Civilization           .857      .845
  Professional Preparation       .861      .874

Appendix C is a table which shows further
reliability figures between forms and correlations
for pre- and post-testing. The tests were carefully
spiraled at all administrations to insure statistical
validity.

Appendix D is a table which shows the intercorrelations
between the competencies measured.
It is interesting to note that there is a fairly high
correlation between the four skills and indeed
formulas have been derived which will permit
with fair reliability prediction of any one skill
performance from the other three. The correlations
are not high enough, however, for us to
recommend omission of any one test. In all cases
the standard error is greatest for the Speaking
and Writing tests and least for Listening and
Reading. In the following table showing the
predictability of performance by skills based on
performance in the other three skills, R is the
correlation between predicted and observed
values. SEest is the standard deviation of the
difference between the predicted and observed
values. The number of cases is too small in
Italian to produce meaningful values.

                                   R       SEest
French (based on 1,336 cases)
  Listening                       .85       4.73
  Speaking                        .83      11.72
  Reading                         .89       5.02
  Writing                         .89       6.03
Spanish (based on 1,334 cases)
  Listening                       .86       4.10
  Speaking                        .78      12.98
  Reading                         .89       4.61
  Writing                         .90       5.90
German (based on 297 cases)
  Listening                       .83       4.90
  Speaking                        .83      10.89
  Reading                         .90       4.85
  Writing                         .90       7.08
Russian (based on 176 cases)
  Listening                       .87       3.26
  Speaking                        .84      10.38
  Reading                         .80       6.50
  Writing                         .81       9.37



Our conclusion, therefore, is that the correla- 
tions are not high enough and the standard 
errors are too large for us to recommend the 
exclusion of any one test particularly when the 
reliability of the individual skill tests is so high 
as to insure reasonably accurate measurement of 
the individual skills in themselves of such inter- 
est to our profession at this time. 
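
The two quantities reported for each skill, R and SEest, are linked by a standard least-squares identity: the standard error of estimate is approximately the standard deviation of the observed scores multiplied by the square root of 1 - R^2 (ignoring small-sample corrections). The sketch below uses that identity to show why a correlation near .85 still leaves a sizable prediction error. It assumes the report's figures follow this identity, which the report does not state, and it assumes that the first SEest column in this copy belongs to French. The helper names are hypothetical.

```python
import math

# (R, SEest) pairs for French as read from the predictability table.
# Assumption: the first SEest column in this copy pairs with French.
FRENCH = {
    "Listening": (0.85, 4.73),
    "Speaking": (0.83, 11.72),
    "Reading": (0.89, 5.02),
    "Writing": (0.89, 6.03),
}


def unexplained_share(r):
    """Share of score variance NOT accounted for by the prediction."""
    return 1.0 - r * r


def implied_sd(r, se_est):
    """Back out the score standard deviation from SEest = SD * sqrt(1 - R^2)."""
    return se_est / math.sqrt(unexplained_share(r))


if __name__ == "__main__":
    for skill, (r, se) in FRENCH.items():
        print(
            f"{skill:10s} R={r:.2f}  "
            f"unexplained variance={unexplained_share(r):.0%}  "
            f"implied SD={implied_sd(r, se):.1f}"
        )
```

Under these assumptions, even the highest correlations in the table leave roughly a fifth of the variance unexplained, which is consistent with the report's decision not to drop any one test.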

One trend indicated in Appendix D of particu- 
lar interest is the relatively low correlation of the 
Professional Preparation test with the other 
competencies. Since the forms were spiraled and 
both were about equally involved in pre- and 
post-testing, the conclusion must be that this 
trend is not a function of the test. From the 
point of view of testing objectives this means that 
in this area we are measuring something quite 
different from the other areas, yet admittedly it is 
one of considerable significance to our profession. 

The high level of reliability and statistical 
validity of the tests, a battery reliability of over 
.90 for each of the languages, is, in conclusion of 
this section, the point we take pleasure in under- 
lining again. If we assume that the language pro- 
fession is to concern itself with proficiency meas- 
urement, and all indications are that such is the 
case, then it may be said that project SAE 8349 
has provided the sensitive instruments with 
which to make such measurement. 
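
A reader who wants to compare the individual reliability figures with the .80 target set at the original conferences could tabulate them as in the sketch below. The values are transcribed from the reliability table as it appears in this copy, so transcription errors are possible. The French Listening Form A figure is illegible here and is left as `None`. The names `RELIABILITY` and `below_target` are illustrative only.

```python
# (Form A, Form B) reliability figures as transcribed from this copy.
# None marks a value that is illegible in this copy (printed as ".9J7").
RELIABILITY = {
    "French": {
        "Listening": (None, 0.912), "Reading": (0.970, 0.930),
        "Writing": (0.935, 0.942), "Linguistics": (0.872, 0.863),
        "Culture-Civilization": (0.858, 0.868),
    },
    "Spanish": {
        "Listening": (0.911, 0.908), "Reading": (0.913, 0.899),
        "Writing": (0.944, 0.948), "Linguistics": (0.837, 0.849),
        "Culture-Civilization": (0.889, 0.886),
    },
    "Italian": {
        "Listening": (0.845, 0.862), "Reading": (0.935, 0.867),
        "Writing": (0.965, 0.950), "Linguistics": (0.845, 0.781),
        "Culture-Civilization": (0.889, 0.829),
    },
    "German": {
        "Listening": (0.921, 0.890), "Reading": (0.942, 0.928),
        "Writing": (0.968, 0.964), "Linguistics": (0.898, 0.887),
        "Culture-Civilization": (0.882, 0.866),
    },
    "Russian": {
        "Listening": (0.815, 0.910), "Reading": (0.928, 0.919),
        "Writing": (0.962, 0.962), "Linguistics": (0.818, 0.855),
        "Culture-Civilization": (0.857, 0.845),
        "Professional Preparation": (0.861, 0.874),
    },
}


def below_target(table, target=0.80):
    """List (language, test, form, value) entries under the target."""
    found = []
    for language, tests in table.items():
        for test, forms in tests.items():
            for form, value in zip("AB", forms):
                if value is not None and value < target:
                    found.append((language, test, form, value))
    return found


if __name__ == "__main__":
    for entry in below_target(RELIABILITY):
        print(entry)
```

As transcribed, every legible figure meets the .80 target except the Italian Linguistics Form B value of .781. The report notes elsewhere that the number of Italian cases was small, and the figure should be checked against a clean copy before any conclusion is drawn from it.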

III. What the Tests Are Telling Us: The wide
employment of the tests in the Summer and 
Academic-Year NDEA Institutes, in preliminary 
form to provide us with data for revision and in 
final form to provide us with normative data, 
has played a double role, for it furnishes the 
government and the profession with objective 
data which supports and justifies the Institute 
program as well as criteria for analyzing the 
effectiveness of individual Institutes. It is 
satisfying to note that the Institute population, 
which together with control groups, Indiana and 
Michigan Study Groups and the Associated 
Colleges of the Mid-West project provided 
2,866 examinees and data from 40,124 tests, in 
the summer of 1961 made measurable progress
in all five languages and in all seven competencies
without exception as illustrated in the following 
table. 

Pre-Test and Post-Test Means Compared 



French (converted scores)
                              Pre-     Post-    Gain
  Listening                   39.0     42.3     3.3
  Speaking                    78.5     84.1     5.6
  Reading                     44.5     46.6     2.1
  Writing                     45.2     46.4     1.2
  Applied Linguistics         45.5     50.7     5.2
  Culture-Civilization        44.3     49.7     5.4
  Professional Prep.          59.1     66.9     7.8

German (converted scores)
                              Pre-     Post-    Gain
  Listening                   38.9     42.2     3.3
  Speaking                    81.2     84.5     3.3
  Reading                     45.3     48.7     3.4
  Writing                     46.3     49.0     2.7
  Applied Linguistics         48.1     53.9     5.8
  Culture-Civilization        48.6     53.9     5.3
  Professional Prep.          60.2     67.3     7.1

Spanish (converted scores)
                              Pre-     Post-    Gain
  Listening                   38.4     41.6     3.2
  Speaking                    73.1     77.5     4.4
  Reading                     41.5     44.1     2.6
  Writing                     45.2     48.7     3.5
  Applied Linguistics         43.5     49.0     5.5
  Culture-Civilization        49.5     56.3     6.8
  Professional Prep.          58.8     66.8     8.0

Russian (converted scores)
                              Pre-     Post-    Gain
  Listening                   39.2     43.1     3.9
  Speaking                    72.0     79.5     7.5
  Reading                     34.7     38.9     4.2
  Writing                     50.6     57.0     6.4
  Applied Linguistics         44.2     48.6     4.4
  Culture-Civilization        49.4     53.1     3.7
  Professional Prep.          59.7     64.4     4.7

Italian (raw scores)
                              Pre-     Post-    Gain
  Listening                   20.3     22.6     2.3
  Speaking                    63.3     71.0     7.4
  Reading                     26.8     30.6     3.8
  Writing                     31.0     34.4     3.4
  Applied Linguistics         27.7     31.0     3.3
  Culture-Civilization        30.0     33.5     3.5
  Professional Prep.          33.0     41.3     8.3



Furthermore we have provided all data to 
measure individual participant scores against 
Institute and national scores and to measure the
individual Institute mean score gains against the
national mean score gains. For norming purposes 
all data have been converted to percentile equiv- 
alents. 

The above table illustrates some interesting 
trends. It indicates, for example, that the great- 
est mean gains throughout the Institutes were 
made in the areas of Applied Linguistics, Cul- 
ture-Civilization, and Professional Preparation. 
It is unquestionably an indication of the fact 
that the population was least knowledgeable 
about these areas to begin with and hence could 
show more dramatic progress, but it is also signifi- 
cant in terms of the purposes of the Institutes, 
which are in part to emphasize these areas. In the 
non-skill areas, greatest gains were consistently 
made in the area of Professional Preparation. In 
this area, which we have observed not to have 
high correlation with the others, it can be stated 
that the Institute Program is having significant 
impact indeed. Of the skill tests, the Speaking 
competency generally shows the most appreci- 
able gains, a fact which is also significant in terms 
of this major purpose of the Institute program. 
The fact that Writing generally shows least gain 
of the skills is probably an indication that it re- 
ceives less emphasis than the other skills. A use 
of the tests is thus pointed up in that Directors 
will be able to decide on the basis of objective 
data where new emphases need to be placed. 
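
The trends described above can be checked directly against the table of pre-test and post-test means. The sketch below is only an illustration: the figures are transcribed from this copy, the French and German Listening post-test means are illegible here and were taken as pre-test mean plus gain, and the names `MEANS`, `mismatches`, and `top_gain` are hypothetical.

```python
SKILLS = ("Listening", "Speaking", "Reading", "Writing")
NON_SKILLS = ("Applied Linguistics", "Culture-Civilization",
              "Professional Preparation")
TESTS = SKILLS + NON_SKILLS

# language -> list of (pre, post, gain), in the order given by TESTS.
# Assumption: the French and German Listening post-test means (42.3, 42.2)
# are reconstructed as pre + gain because they are illegible in this copy.
_RAW = {
    "French": [(39.0, 42.3, 3.3), (78.5, 84.1, 5.6), (44.5, 46.6, 2.1),
               (45.2, 46.4, 1.2), (45.5, 50.7, 5.2), (44.3, 49.7, 5.4),
               (59.1, 66.9, 7.8)],
    "German": [(38.9, 42.2, 3.3), (81.2, 84.5, 3.3), (45.3, 48.7, 3.4),
               (46.3, 49.0, 2.7), (48.1, 53.9, 5.8), (48.6, 53.9, 5.3),
               (60.2, 67.3, 7.1)],
    "Spanish": [(38.4, 41.6, 3.2), (73.1, 77.5, 4.4), (41.5, 44.1, 2.6),
                (45.2, 48.7, 3.5), (43.5, 49.0, 5.5), (49.5, 56.3, 6.8),
                (58.8, 66.8, 8.0)],
    "Russian": [(39.2, 43.1, 3.9), (72.0, 79.5, 7.5), (34.7, 38.9, 4.2),
                (50.6, 57.0, 6.4), (44.2, 48.6, 4.4), (49.4, 53.1, 3.7),
                (59.7, 64.4, 4.7)],
    "Italian": [(20.3, 22.6, 2.3), (63.3, 71.0, 7.4), (26.8, 30.6, 3.8),
                (31.0, 34.4, 3.4), (27.7, 31.0, 3.3), (30.0, 33.5, 3.5),
                (33.0, 41.3, 8.3)],
}
MEANS = {lang: dict(zip(TESTS, rows)) for lang, rows in _RAW.items()}


def mismatches(table, tolerance=0.05):
    """Rows whose printed gain differs from post minus pre."""
    return [(lang, test)
            for lang, tests in table.items()
            for test, (pre, post, gain) in tests.items()
            if abs((post - pre) - gain) > tolerance]


def top_gain(tests, names):
    """Name of the test with the largest printed gain among `names`."""
    return max(names, key=lambda name: tests[name][2])


if __name__ == "__main__":
    print("Rows where gain != post - pre:", mismatches(MEANS))
    for lang, tests in MEANS.items():
        print(lang, "| top skill:", top_gain(tests, SKILLS),
              "| top non-skill:", top_gain(tests, NON_SKILLS))
```

In this transcription the Italian Speaking row is the only one where the printed gain does not equal the difference of the two means, so one of its three figures is probably misprinted or misread. Professional Preparation shows the largest non-skill gain in every language, as the report states. Speaking leads the skill tests in four of the five languages, with German the exception, which fits the report's wording that Speaking "generally" shows the largest gains.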

In a recent administration to a native-speaking 
group, it was observed that although high scores 
were obtained in the skill areas, no advantage to 
the native speaker could be observed in the three 
non-skill areas. In fact the scores in these areas 
were significantly low. Assuming again the pro- 
fessional validity of the “qualifications” criteria 
which guided the construction of the tests, our 
conclusion may well be that native speakers 
need the Institute program or special programs 
emphasizing the non-skill areas as a part of their 
training for teaching foreign languages in this 
country. 

IV. Impact and the Future: Easily two hundred
individuals, as members of the test construction
committees, as scorers,¹ as voices on tapes,
and as consultants, have participated in the
project. They represent all teaching ranks in
many different institutions at all levels throughout
the country. This fact alone has not only
disseminated the purposes of the tests and information
about them, but it has involved many
people in proficiency testing as a function of the
teaching process. The devotion and high quality
of work which has characterized the efforts of all
from the beginning is a testimonial to the sense
of professionalism existing among language
teachers.

¹ The scorers, who were carefully selected through interviews
for their linguistic training and skills, met together for
orientation training at the beginning of each scoring session.
The scorers for French were: Marcella Buxbaum, Carolyn
Goldberg, Dennis Healy, José Huertas-Jourda, Lillian B.
Jeanpierre, Wendell A. Jeanpierre, Fred Myers, Cecile Nebel,
Marie Louise Pesselier, Rizel Pincus, Annette Schwartzberg,
Carolyn Strauss, Marcel Wallace; for German: Carl Buchman,
R. Travis Hardaway, Margaret Mong, Senta Stiefel;
for Italian: Donatella Careccia, Marcia Cobourn; for
Russian: Alexander Chlopoff, Irene Gendzier, George Holen-

In connection with his MLA duties, the Direc- 
tor has been called upon to write several articles, 
to give numerous speeches to professional groups, 
and to attend a number of conferences concerning 
testing in other disciplines. A list follows as an
indication of the type of dissemination involved. 

Articles

Illinois Educational Press Bulletin, December 1959
ETS Developments, VIII 1, October 1959
PMLA, May 1961
MLJ, in preparation

Speeches

Foreign Language Association of Northern California, Stanford University
ETS Invitational Conference, New York
Education Seminar, University of Maine
Annual Meeting MLA, Philadelphia
Central States Modern Language Teachers Association, Cleveland

Conferences addressed on the subject of the tests

The Council on Cooperation in Teacher Education, Washington, D. C.
The National Commission on Accrediting, Washington, D. C.
The National Council for Accreditation of Teacher Education, Washington, D. C.
Conference 22 (Teacher Training Curricula in the Foreign Language Field), MLA Annual Meetings, Chicago, Ill.
Institute Directors Meetings, Chicago and Boulder
Regional TEPS Conference, Boston
Cooperative Classroom Testing Project, Princeton
Center for Applied Linguistics Conference on Testing Common and Less Common Languages, Washington, D. C.
ETS-AGS Conference on Testing for Graduate Language Requirements, Princeton
Conferences on Testing English as a Foreign Language, Washington, D. C.

Post Hoc Ergo Propter Hoc? Whether or not 
there is a direct relationship between the foregoing
activities and subsequent ones, it has been
clear to the Director that there has been steadily 
increasing interest in the concept of proficiency 
testing. A case in point is the following quotation
from the annual report of the Council on Cooperation
in Teacher Education in 1960: “The Council
on Cooperation in Teacher Education recommends
that research and development in proficiency
examinations of all kinds be encouraged
and subsidized. The Modern Language Associa- 
tion in proposing requirements and in developing 
tests for assessing competency in the teaching of 
foreign languages has set an example that may 
be followed by other academic and professional 
disciplines.” The Director is also currently 
Chairman of the French Committee for the As- 
sociation of Graduate Schools’ project to de- 
velop standardized tests for use in graduate 
schools in connection with language require- 
ments. He is serving on the National Advisory 
Council for a project on the Testing of English as 
a Foreign Language and is exploring, at the re- 
quest of several government agencies, the problem 
of tests in common and less common languages. 
It seems evident to the Director that the MLA 
project, as a first of its kind in the foreign- 
language field, has created new interest in the 
development of standardized tests for the meas- 
urement of foreign-language skills through hav- 
ing demonstrated the fact that reliable instru- 
ments can be built which are consonant with the 
uses of foreign languages in today’s world. 

By far the greatest impact, however, derives 
from the numbers of people to whom the tests 
have been administered. By the end of the 
1962-63 academic year we estimate that nearly 
10,000 individuals will have been tested and more 
than 132,000 tests administered. The following 
table illustrates the uses of the tests to that date. 

Figures on Institutes, Participants, and Tests 

Summer 1960 

37 Summer Institutes 

2,154 Examinees (including Carnegie Inter-Uni- 
versity Program and Middlebury Italian 
School) 

28,625 Tests Administered 
Academic Year 1960-61 

5 Academic Year Institutes 
108 Examinees 

1,512 Tests Administered 
Summer 1961 

67 Summer Institutes 

2 , 866 Examinees (including Indiana and Michigan 
Study Groups, and Associated Colleges of 
the Midwest) 



koff. Rose Lefel, Natalia Sukacev, for Spanish: Ethel 
Arcilagos, Lucia Bonilla, Mary Cannizzo, Vincent Durkin, 
Victor Fuentes, Antonio Gila, Juan Lopez, Margaret Mc- 
Evoy, Rizel Pincus, Mrs. Stanley Redka. 



Wilmarth H. Starr 



7 



    40,124 Tests Administered

Academic Year 1961-62
    7 Academic Year Institutes
    162 Participants
    2,268 Tests Administered and Scored

Summer 1962
    79 Summer Institutes
    4,028 Examinees
    56,392 Tests to be Administered

Academic Year 1962-63
    5 Academic Year Institutes
    119 Examinees
    1,666 Tests to be Administered

Miscellaneous Small Programs in which we are already involved
    11 Institutions or Agencies
    421 Number of Examinees to Date
    2,105 Number of Tests to Date

Grand Totals
                 Examinees     Tests Administered
    1960-62          5,290                 72,529
    1962-63          4,568                 60,163
    Total            9,858                132,692
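The grand totals can be checked against the per-period figures. The sketch below is only an arithmetic cross-check: the numbers are transcribed from the table, while the variable names and the grouping of the miscellaneous small programs under 1962-63 are assumptions (that grouping is the one under which the printed subtotals add up).

```python
# (period, examinees, tests) transcribed from the table of figures.
# Assumption: the miscellaneous small programs are counted under 1962-63.
FIGURES = {
    "1960-62": [
        ("Summer 1960", 2154, 28625),
        ("Academic Year 1960-61", 108, 1512),
        ("Summer 1961", 2866, 40124),
        ("Academic Year 1961-62", 162, 2268),
    ],
    "1962-63": [
        ("Summer 1962", 4028, 56392),
        ("Academic Year 1962-63", 119, 1666),
        ("Miscellaneous Small Programs", 421, 2105),
    ],
}


def subtotal(rows):
    """Return (examinees, tests) summed over the rows of one period."""
    return (sum(r[1] for r in rows), sum(r[2] for r in rows))


totals = {period: subtotal(rows) for period, rows in FIGURES.items()}
grand = (
    sum(t[0] for t in totals.values()),
    sum(t[1] for t in totals.values()),
)

print(totals)  # {'1960-62': (5290, 72529), '1962-63': (4568, 60163)}
print(grand)   # (9858, 132692)
```

Under that grouping the computed subtotals and grand totals agree with the printed ones.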



Such widespread use cannot help but have an 
impact upon language teachers in terms of an 
increased sense of professionalism and in the 
profit which derives from the identification of 
strengths and weaknesses. 

In addition, we have had over 200 requests 
from State and Local Boards of Education, from 
various institutions and agencies, from depart- 
ments of foreign languages and from individuals. 
In anticipation of the ongoing interest, the Office 
of Education has this spring granted permission 
to the MLA for a ten-year period “to reproduce, 
administer, distribute in a manner consistent 
with test security, and otherwise to exploit in the 
public interest, the Foreign Language Proficiency 
Tests for Teachers and Advanced Students in- 
cluding any revisions thereof.” The MLA has in 
consequence entered into a contract with ETS 
for a two-year period to insure their professional 
assistance and continuing collaboration in distribution,
administration, and scoring of the tests
as well as to secure their help in making revisions 
and new forms. 

The Director is pleased to report that the 
following Institutions and Agencies are already 
using the tests or have contracted for their use: 
Hampton Institute (for undergraduate majors) ; 
University of Massachusetts; Emmanuel Mis- 
sionary College, Michigan; Indiana University 
(Russian Study Group); Oberlin (French Study 
Group in France), (Spanish Study Group in 
Mexico); Associated Colleges of the Mid-West 
(Saint Olaf, Beloit, Monmouth, Lawrence, Coe); 
State of Pennsylvania; State of New Hampshire; 
State of Delaware; Washington, D.C. (Local
Board). In addition, the following States are
seriously exploring the possibility: Vermont, 
Hawaii, Massachusetts. 

Reports from the users indicate enthusiastic 
satisfaction and it seems reasonable to be optim- 
istic about the gradual increase in participation. 
An eight-page brochure describing the tests in 
some detail is planned for fall distribution. 

V. Concluding Remarks: No one is more aware 
than the Project Director that such success as 
may be ascribed to the completion of the project 
is due to the cooperation and competence of all 
those who have participated. He would be remiss, 
however, if he did not name for special mention 
George Winchester Stone, Jr., Donald D. Walsh, 
Allan Hubbell, and Harry Alonso of the MLA 
Staff, and Robert Solomon and Mrs. Miriam
Bryan of ETS. The project could not have been 
completed without their constant support. The 
Director is embarrassed only by the fact that 
space does not allow for the listing of the many 
others to whom major credit is due in both organ- 
izations. He feels, too, that it must be stated 
again that the MLA-ETS collaboration has been 
most successful and mutually reinforcing. It has 
been an experience that is a tribute to the pro- 
fessional quality and attitudes of both organiza- 
tions. 






Appendix A. Qualifications for Teachers of Modern Foreign Languages 



Competence: Listening Comprehension

    SUPERIOR: Ability to follow closely and with ease all types of
    standard speech, such as rapid or group conversation and
    mechanically transmitted speech.

    GOOD: Ability to understand conversation of normal tempo,
    lectures, and news broadcasts.

    MINIMAL: Ability to get the sense of what an educated native says
    when he is making a special effort to be understood and when he is
    speaking on a general and familiar subject.

Competence: Speaking

    SUPERIOR: Ability to speak fluently, approximating native speech
    in vocabulary, intonation, and pronunciation. Ability to exchange
    ideas and to be at ease in social situations.

    GOOD: Ability to talk with a native without making glaring
    mistakes, and with a command of vocabulary and syntax sufficient
    to express one’s thoughts in conversation at normal speed with
    reasonably good pronunciation.

    MINIMAL: Ability to read aloud and to talk on prepared topics
    (e.g., for classroom situations) without obvious faltering, and to
    use the common expressions needed for getting around in the
    foreign country, speaking with a pronunciation understandable to a
    native.

Competence: Reading

    SUPERIOR: Ability to read almost as easily as in English material
    of considerable difficulty.

    GOOD: Ability to read with immediate comprehension prose and verse
    of average difficulty and mature content.

    MINIMAL: Ability to grasp directly (i.e. without translating) the
    meaning of simple, non-technical prose, except for an occasional
    word.

Competence: Writing

    SUPERIOR: Ability to write on a variety of subjects with idiomatic
    naturalness, ease of expression, and some feeling for the style of
    the language.

    GOOD: Ability to write a simple “free composition” such as a
    letter, with clarity and correctness in vocabulary, idiom, and
    syntax.

    MINIMAL: Ability to write correctly sentences or paragraphs such
    as would be developed orally for classroom situations and to write
    a simple description or message without glaring errors.

Competence: Applied Linguistics

    SUPERIOR: The “good” level of competency with additional knowledge
    of descriptive, comparative, and historical linguistics.

    GOOD: The “minimal” level of competency with additional knowledge
    of the development and present characteristics of the language.

    MINIMAL: Ability to apply to language teaching an understanding of
    the differences in the sound system, forms, and structures of the
    foreign language and English.

Competence: Culture and Civilization

    SUPERIOR: An enlightened understanding of the foreign people and
    their culture, such as is achieved through personal contact,
    through travel and residence abroad, through study of systematic
    descriptions of the foreign culture, and through study of
    literature and the arts.

    GOOD: The “minimal” level of competency with first-hand knowledge
    of some literary masterpieces and acquaintance with the geography,
    history, art, social customs, and contemporary civilization of the
    foreign people.

    MINIMAL: An awareness of language as an essential element of
    culture and an understanding of the principal ways in which the
    foreign culture differs from our own.

Competence: Professional Preparation

    SUPERIOR: A mastery of recognized teaching methods, evidence of
    breadth and depth of professional outlook, and the ability to
    experiment with and evaluate new methods and techniques.

    GOOD: “Minimal” level of competency plus knowledge of the use of
    specialized techniques, such as audio-visual aids, and of the
    relation of language teaching to other areas of the curriculum.
    Ability to evaluate the professional literature of foreign
    language teaching.

    MINIMAL: Knowledge of the present-day objectives of the teaching
    of foreign languages as communication and an understanding of the
    methods and techniques for attaining these objectives.



N.B. The names of the seven competencies were also slightly modified and appear in the test batteries as listed here. 






Appendix B. Test Construction Committees

LISTENING COMPREHENSION

French
    Edward Geary, Chairman                                  Harvard
    Alain Seznec                                            Cornell
    Edmond Meras                                            Phillips Exeter
Spanish
    Patricia O’Connor, Chairman                             Brown
    Sol Saporta                                             Washington
    Filomena Peloro                                         Mat. Dev. Center, NYC
Italian
    James Ferrigno, Chairman                                Massachusetts
    Carlo Vacca                                             Wellesley (Mass.) H.S.
    Rigo Mignani                                            Harpur
German
    Jack Stein, Chairman                                    Harvard
    Walter Lohnes                                           Phillips Academy
    Hugo Schmidt                                            Bryn Mawr
Russian
    Richard Burgi, Chairman (1959-60)                       Yale
    Rostislav Rozdestvensky (1959-60), Chairman (1960-61)   Glastonbury (Conn.) Sch.
    Nina Berberova-Kochevitsky                              New Haven, Conn.

SPEAKING

French
    James Iannucci, Chairman                                St. Josephs
    Frederic St. Aubyn                                      Delaware
    Annette Emgarth                                         Dover, Delaware
Spanish
    Stanley Sapon, Chairman                                 Ohio State
    Edward Allen                                            Univ. School, Ohio State
    Chris Nacci (1959-60)                                   Capital
    Sandra Scharff (1960-61)                                Ohio State
Italian
    Robert Politzer, Chairman                               Michigan
    Peter Fodale                                            Michigan
    Fred Bosco                                              Michigan
German
    Herbert Penzl, Chairman                                 Michigan
    Mary Crichton                                           Michigan
    Max Dufner                                              Michigan
Russian
    William Edgerton, Chairman (1959-60)                    Indiana
    Horace Dewey (1959-61), Chairman (1960-61)              Michigan
    Nonna Shaw                                              Indiana

READING

French
    Linn Edsall, Chairman                                   Wayne State
    Jane Bourque                                            Madison, Conn.
    Philip Wadsworth (1959-60)                              Illinois
    Paula Thibault (1960-61)                                Detroit, Mich.
Spanish
    Frederick Agard, Chairman (1959-60)                     Cornell
    Dalai Brenes (1959-61), Chairman (1960-61)              Cornell
    Katherine Whitmore                                      Smith
Italian
    Norma Fornaciari, Chairman (1959-60)                    Roosevelt
    Clarence Turner (1959-61), Chairman (1960-61)           Rutgers
    Maria Piccirilli                                        Vassar
    Guido Guarino (1960-61)                                 Rutgers
German
    C. R. Goedsche, Chairman                                Northwestern
    Werner F Jlman                                          Princeton
    Meno Spann                                              Northwestern
Russian
    Assya Humesky, Chairman                                 Syracuse
    Horace Dewey (1959-60)                                  Michigan
    Dale Winkels (1959-60)                                  Michigan
    Clayton Dawson (1960-61)                                Syracuse
    Nicholas Karateew (1960-61)                             Syracuse








WRITING

French
    Nelson Brooks, Chairman                                 Yale
    Pierre Capretz                                          Yale
    Gordon Christopher                                      Hillhouse H.S., New Haven
Spanish
    Elizabeth Nicholas, Chairman                            Mat. Dev. Center, NYC
    Jeannette Atkins                                        Staples H.S., Westport, Conn.
    Jaime Muirden                                           New Haven, Conn.
Italian
    Robert Serafino, Chairman                               State Dept. of Ed., Conn.
    Bianca Calabresi                                        Albertus Magnus
    Arthur Selvi                                            Central Conn. State Coll.
German
    Joseph Reichard, Chairman                               Oberlin
    Edith Runge                                             Mount Holyoke
    Walter Lohnes                                           Phillips Andover
Russian
    Horace Lunt, Chairman (1959-60)                         Harvard
    Bayara Tschirwa (1959-61), Chairman (1960-61)           Harvard
    Dmitry Grigorieff                                       Columbia
    Marina Prochoroff (1960-61)                             Mat. Dev. Center, NYC

APPLIED LINGUISTICS

French
    Robert Politzer, Chairman                               Michigan
    Harry Bratnober                                         Macalester
    Albert Valdman                                          Indiana
    Fernand Marty                                           Hollins College
Spanish
    Sol Saporta, Chairman                                   Washington
    Patricia O’Connor                                       Brown
    Mary Temperly (1960-61)                                 Illinois
    Ismael Silva-Fuenzalida (1959-60)                       Foreign Service Institute School of Languages
Italian
    Edward Williamson, Chairman                             Wesleyan
    Anthony Pellegrini                                      Vassar
    Salvatore Castiglione (1959-60)                         Georgetown
    Ernest Pulgram (1960-61)                                Michigan
German
    Freeman Twaddell, Chairman                              Brown
    R. M. S. Heffner                                        Wisconsin
    William Moulton                                         Princeton
Russian
    William Cornyn, Chairman                                Yale
    Vladimir Petrov                                         Yale
    Howard Garey                                            Yale

CULTURE-CIVILIZATION

French
    Georges May, Chairman                                   Yale
    Joseph Stookins                                         Loomis
    Kenneth Cornell                                         Yale
Spanish
    Theodore Andersson, Chairman                            Texas
    Miguel Enguidanos                                       Texas
    Andrea McHenry                                          Houston Schools
Italian
    Charles Speroni, Chairman                               UCLA
    Aldo Scaglione                                          UC (Berkeley)
    Gaetano Pomposo                                         Pittsburgh (Calif.) H.S.
German
    Else Fleissner, Chairman                                Wells
    Karl Koenig                                             Colgate
    Anthony Schepsis                                        Utica Free Academy
Russian
    Leon Stilman, Chairman (1959-60)                        Columbia
    William Harkins, Chairman (1960-61)                     Columbia
    Peter Juviler (1959-60)                                 Hunter
    Mrs. Edward C. Bill (1959-60)                           Princeton
    Henry Morton (1960-61)                                  Queens
    Francis Randall (1960-61)                               Columbia










PROFESSIONAL PREPARATION

    Alfred Pellegrino, Chairman     (Italian)     Maine
    Germaine Cressey                (French)      Montclair
    Mary Thompson                   (Spanish)     Glastonbury, Conn.
    George Scherer                  (German)      Colorado
    Emma Birkmaier                  (Russian)     Minnesota
    Meyer Krakowski (1959-60)                     L. A. City College



Appendix C. Reliability of MLA Foreign Language Proficiency Tests

                    BETWEEN A         PRE- AND POST-
                    AND B FORMS       TEST CORRELATIONS

FRENCH
    Listening       .91               .84
    Speaking        .93               .82
    Reading         .94               .88
    Writing         .87               .78
    Cult.-Civ.      .86               .74
    Prof. Prep.     *                 .67

GERMAN
    Listening       .90               .84
    Speaking        *                 .81
    Reading         .94               .90
    Writing         .97               .92
    App. Ling.      .89               .85
    Cult.-Civ.      .87               .78
    Prof. Prep.     *                 .64

SPANISH
    Listening       .90               .86
    Speaking        *                 .79
    Reading         .90               .88


    Writing         .95               .90
    App. Ling.      .84               .78
    Cult.-Civ.      .89               .78
    Prof. Prep.     *                 .70

ITALIAN
    Skills listed:            Listening, Speaking, Reading, Writing,
                              Cult.-Civ., Prof. Prep.
    Between A and B forms:    .85   .90   .96   .81   .86
    Pre- and post-test:       .86   .92   .96   .84   .85

RUSSIAN
    Listening       .80
    Speaking        .79
    Reading         .85
    Writing         .91
    App. Ling.      .77
    Cult.-Civ.      .75
    Prof. Prep.     .67

* Not estimated because of limited number of cases
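For readers who want to see what a coefficient of this kind measures, the following is a minimal sketch of a correlation between two sets of scores. It assumes the reported figures are ordinary Pearson product-moment correlations, which the appendix does not state; the function name and the sample scores are hypothetical and are not taken from the report.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary in both lists")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for six examinees on Form A and Form B (illustration only).
form_a = [31, 42, 25, 38, 47, 29]
form_b = [33, 40, 27, 35, 49, 30]
print(round(pearson_r(form_a, form_b), 2))
```

The same function applies to pre-test and post-test scores for the same examinees; a value near 1 means the two administrations rank examinees in nearly the same order.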










Appendix D. Intercorrelations Between Skills

FRENCH

               LISTENING   SPEAKING    READING     WRITING     LING.       CULT.-CIV.  PROF. PREP.
Listening      --          .784        .800        .782        .566        .540        .317
Speaking       .784        --          .739        .804        .523        .487        .269
Reading        .800        .739        --          .859        .638        .643        .371
Writing        .781        .804        .858        --          .660        .564        .344
Linguistics    .566        .523        .638        .660        --          .592        .548
Cult.-Civ.     .540        .487        .634        .564        .592        --          .477
Prof. Prep.    .317        .279        .371        .344        .548        .477        --

GERMAN

               LISTENING   SPEAKING    READING     WRITING     LING.       CULT.-CIV.  PROF. PREP.
Listening      --          .727        .817        .781        .509        .512        .230
Speaking       .727        --          .742        .787        .551        .403        .206
Reading        .817        .742        --          .860        .624        .614        .274
Writing        .781        .787        .860        --          .738        .629        .326
Linguistics    .509        .551        .624        .738        --          .670        .460
Cult.-Civ.     .512        .403        .614        .629        .670        --          .455
Prof. Prep.    .230        .206        .274        .326        .460        .455        --

SPANISH

               LISTENING   SPEAKING    READING     WRITING     LING.       CULT.-CIV.  PROF. PREP.
Listening      --          .925        .797        .796        .479        .599        .371
Speaking       .925        --          .691        .724        .412        .525        .293
Reading        .797        .691        --          .857        .568        .687        .430
Writing        .796        .724        .857        --          .625        .677        .467
Linguistics    .479        .412        .568        .625        --          .589        .616
Cult.-Civ.     .599        .525        .687        .677        .589        --          .520
Prof. Prep.    .371        .293        .430        .467        .616        .520        --

ITALIAN

(Intercorrelations Impossible Because of Small Number of Cases)

RUSSIAN




               LISTENING   SPEAKING    READING     WRITING     LING.       CULT.-CIV.  PROF. PREP.
Listening      --          .743        .770        .746        .480        .408        .238
Speaking       .743        --          .697        .727        .436        .304        .160
Reading        .770        .697        --          .669        .367        .328        .078
Writing        .746        .727        .669        --          .708        .480        .298
Linguistics    .480        .436        .367        .708        --          .506        .511
Cult.-Civ.     .408        .304        .328        .480        .506        --          .408
Prof. Prep.    .238        .160        .078        .298        .511        .408        --
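An intercorrelation matrix should be symmetric, since the correlation of Listening with Writing equals that of Writing with Listening. The sketch below checks this for the French matrix as it appears in this copy. The values are transcribed from the appendix; the names `FRENCH`, `SKILLS`, and `asymmetric_pairs` are hypothetical. Whether any mismatch originates in the 1962 printing or in the reproduction cannot be determined from this text.

```python
SKILLS = ["Listening", "Speaking", "Reading", "Writing",
          "Linguistics", "Cult.-Civ.", "Prof. Prep."]

# French intercorrelations as printed; None marks the empty diagonal.
FRENCH = [
    [None, .784, .800, .782, .566, .540, .317],
    [.784, None, .739, .804, .523, .487, .269],
    [.800, .739, None, .859, .638, .643, .371],
    [.781, .804, .858, None, .660, .564, .344],
    [.566, .523, .638, .660, None, .592, .548],
    [.540, .487, .634, .564, .592, None, .477],
    [.317, .279, .371, .344, .548, .477, None],
]


def asymmetric_pairs(matrix, names):
    """Return (row skill, column skill, upper value, lower value) where mirrors differ."""
    mismatches = []
    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            if matrix[i][j] != matrix[j][i]:
                mismatches.append((names[i], names[j], matrix[i][j], matrix[j][i]))
    return mismatches


for pair in asymmetric_pairs(FRENCH, SKILLS):
    print(pair)
```

For the French matrix this reports four pairs whose mirrored entries differ (Listening and Writing, Speaking and Prof. Prep., Reading and Writing, Reading and Cult.-Civ.). The German, Spanish, and Russian matrices are symmetric as printed.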









